Read the raw data from an online github respository (https://github.com/nytimes/covid-19-data). It daily updates the latest real-time data of COVID-19 cases and deaths in the USA by compiling data from the New York Time.
#install packages
library(RCurl)
library(dplyr)
library(ggplot2)
library(shadowtext)
library(plotly)
a <- read.csv(text=getURL("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"),header=T)
Set the baseline as 100 cases for each state. Set time variable as the number of days since the state reached the 100th cases
aa <- a %>%
as_tibble %>%
filter(cases > 100) %>%
group_by(state) %>%
mutate(days_since_100 = as.numeric(as.Date(date) - min(as.Date(date)))) %>%
ungroup
#Set the points of case numbers on y-axis
breaks = c(100,200,500,1000,2000,5000,10000,20000,50000,100000,200000,500000,1000000)
p <- ggplot(aa, aes(days_since_100, cases, color = state)) +
geom_line(size = 0.8) +
geom_point(pch = 21, size = 1) +
scale_y_log10(expand = expansion(add = c(0,0.1)),
breaks = breaks, labels = breaks) +
scale_x_continuous(expand = expansion(add = c(0,1))) +
theme_minimal(base_size = 14) +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
plot.margin = margin(3,15,3,3,"mm")
) +
coord_cartesian(clip = "off") +
labs(x = "Number of days since 100th case", y= "",
subtitle = "Total number of COVID-19 cases in USA by state") +
geom_shadowtext(aes(label = paste0("", state)), hjust=0, vjust=0,
data = .%>% group_by(state) %>% top_n(1, days_since_100),
bg.color = "white")
p
fig <- ggplotly(p)
fig
Now I want to know the case fatality rate (CFR) by calculating deaths divided by cases in each state. I create this new variable. Then I want to see which state or area have the maximum, median, and minimum cases among the US. I extract the rows of this data include the latest case and death counts, and CFR. Finally I put them into a table.
#create the new data with the latest case and death counts, and CFR
s <- a %>%
group_by(state) %>%
summarise(latest_cases = max(cases, na.rm=T),
latest_deaths = max(deaths, na.rm=T),
latest_CFR = paste0(round(latest_deaths*100/latest_cases, digits=2),'%'))
#Put the 3 rows into a table
summary_table <-
s[c(which.max(s$latest_cases),
which(s$latest_cases==median(s$latest_cases)),
which.min(s$latest_cases)), ]
summary_table$cases_level <- c('max cases','median cases','min cases')
summary_table <- summary_table[,c(1,5,2:4)]
knitr::kable(summary_table, row.names(c('max cases','median cases','min cases')),align = 'c')
| state | cases_level | latest_cases | latest_deaths | latest_CFR |
|---|---|---|---|---|
| California | max cases | 817939 | 15791 | 1.93% |
| Arkansas | median cases | 82755 | 1350 | 1.63% |
| Northern Mariana Islands | min cases | 69 | 2 | 2.9% |
California has the most cumulative cases with 817939 cases so far. It’s case fatality rate is 1.93%. Northern Mariana Islands has the least cumulative cases with 69 cases so far. It’s case fatality rate is 2.9%.